Interpreting Unit Segmentation of Conversational Speech in Simultaneous Interpretation Corpus

Authors

  • Zhe DING
  • Koichiro RYU
  • Shigeki MATSUBARA
  • Masatoshi YOSHIKAWA
Abstract

Speech-to-speech translation has become an important research topic as speech and language processing technology has progressed. For an efficient and smooth cross-lingual conversation, the simultaneity of translation processing has a great influence on system performance. This paper describes the segmentation of conversational bilingual speech into interpreting units in the simultaneous interpretation corpus developed at Nagoya University. By manually locating the segmentation points of spoken utterances in the speech corpus, we identified the clause unit as a practical interpreting unit. We examined the availability of this unit and segmented spoken dialogue sentences into interpreting units. A large-scale bilingual corpus annotated with interpreting units can be used for simultaneous machine interpretation.

Similar resources

Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation

With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses the sentence as the translation unit. On the other hand, there exist a few studies on simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...

Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research

This paper describes the design, analysis and utilization of a simultaneous interpretation corpus. The corpus has been constructed at the Center for Integrated Acoustic Information Research (CIAIR) of Nagoya University in order to promote the realization of the multi-lingual communication supporting environment. The size of transcribed data is about 1 million words, and the corpus would deserve...

Collection of Simultaneous Interpreting Patterns by Using Bilingual Spoken Monologue Corpus

This paper investigates simultaneous interpreting patterns using a bilingual spoken monologue corpus. A total of 4,578 pairs of aligned English-Japanese utterances in the CIAIR simultaneous interpretation database were used. This is the largest-scale investigation based on observation of simultaneous interpreting speech. The simultaneous interpreters are required to generate the target speech si...

Bilingual Spoken Monologue Corpus for Simultaneous Machine Interpretation Research

This paper describes a large-scale bilingual corpus of spoken monologues and their simultaneous interpretation, which has been constructed at CIAIR. The corpus has the following characteristics: (1) English and Japanese speeches are recorded in parallel, (2) the data contains monologue speeches such as lectures and self-introductions, and (3) the exact beginning and ending times are prov...

Recognition and Understanding of Meetings

This paper is about interpreting human communication in meetings using audio, video and other signals. Automatic meeting recognition and understanding is extremely challenging, since communication in a meeting is spontaneous and conversational, and involves multiple speakers and multiple modalities. This leads to a number of significant research problems in signal processing, in speech recognit...

Publication date: 2005